The outline of this project is shown below.
I selected the StyleGAN model because its adaptive instance normalization (AdaIN) layers and improved network structures enable it to generate images with realistic and diverse features. In comparison to vanilla GAN, StyleGAN exhibits a superior ability to produce high-quality images.
I selected the w+ latent space, an extension of StyleGAN's intermediate latent space w that assigns a separate latent vector to each generator layer, which provides finer-grained control over the generated images.
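As a rough illustration of why w+ gives finer control, the sketch below builds a w+ code by copying one w vector per generator layer and then edits a single layer's copy. The function name, the latent size of 512, and the layer count of 14 are assumptions chosen for illustration; they are not taken from this project's code.

```python
def broadcast_w_to_w_plus(w, num_layers):
    """Return a w+ code: one independent copy of w for each generator layer."""
    return [list(w) for _ in range(num_layers)]


# Assumed sizes, for illustration only.
LATENT_DIM = 512
NUM_LAYERS = 14

w = [0.0] * LATENT_DIM
w_plus = broadcast_w_to_w_plus(w, NUM_LAYERS)

# In w space, every layer shares the same vector.
# In w+ space, each layer's vector can be changed on its own.
w_plus[3][0] = 1.0  # only layer 3 is modified
```

Because each layer holds its own copy, an optimizer working in w+ can adjust coarse and fine layers separately, at the cost of many more free parameters than w.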
I opted for l1 as the loss_type. Compared to the l2 loss, the l1 loss penalizes errors in proportion to their magnitude rather than their square, so a few large errors do not dominate the objective. In practice, this tends to encourage the network to generate images with sharper edges and more pronounced features.
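To make the difference between the two losses concrete, here is a minimal sketch in plain Python. The helper names l1_loss and l2_loss and the toy error vectors are illustrative assumptions, not this project's code. Both error patterns below have the same l1 loss, but l2 penalizes the pattern concentrated in one large error four times more heavily.

```python
def l1_loss(pred, target):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def l2_loss(pred, target):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


target = [0.0, 0.0, 0.0, 0.0]
small_errors = [0.5, 0.5, 0.5, 0.5]  # error spread evenly over all pixels
one_outlier = [0.0, 0.0, 0.0, 2.0]   # same total error, in a single pixel

l1_small = l1_loss(small_errors, target)   # 0.5
l1_outlier = l1_loss(one_outlier, target)  # 0.5
l2_small = l2_loss(small_errors, target)   # 0.25
l2_outlier = l2_loss(one_outlier, target)  # 1.0
```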
I selected a perc_wgt value of 0.01. A higher perceptual loss weight, such as perc_wgt = 0.1, tends to cause the generator to focus excessively on the reference image, leading to outputs that lack creativity. Conversely, using no perceptual loss (e.g., perc_wgt = 0) results in perceptually unrealistic and unappealing outputs.
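The role of perc_wgt can be sketched as a weighted sum of a pixel loss and a perceptual loss. This form is an assumption about how the two terms are combined; the function name total_loss and the example loss values are hypothetical and only show how the weight shifts the balance.

```python
def total_loss(pixel_loss, perceptual_loss, perc_wgt=0.01):
    """Combine the two terms; assumed form: pixel + perc_wgt * perceptual."""
    return pixel_loss + perc_wgt * perceptual_loss


# Hypothetical loss values, chosen only to show the effect of the weight.
pixel = 0.2
perceptual = 5.0

no_perc = total_loss(pixel, perceptual, perc_wgt=0.0)    # perceptual term ignored
chosen = total_loss(pixel, perceptual, perc_wgt=0.01)    # the value selected above
heavy = total_loss(pixel, perceptual, perc_wgt=0.1)      # perceptual term dominates
```

With these example numbers, the perceptual term contributes nothing at perc_wgt = 0, a modest share at 0.01, and most of the total at 0.1.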
Using a single RTX 3090 GPU, the vanilla GAN took 8.711 seconds to run, while StyleGAN's runtime ranged from 25.298 to 26.602 seconds, depending on the selected hyperparameters.
Now that the hyperparameter tuning is done, let's look at some visual results.
The outcomes of the interpolated gif experiments are presented below. The resulting gifs exhibit exceptional visual appeal, realism, and coherence.
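One common way to produce the frames of such a gif is to interpolate linearly between two latent codes and render each intermediate code with the generator. The sketch below assumes linear interpolation, which may differ from the scheme used here; the names lerp and interpolate_latents are hypothetical, and the generator call is omitted.

```python
def lerp(a, b, t):
    """Linear interpolation between two equal-length vectors, t in [0, 1]."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolate_latents(z_start, z_end, num_frames):
    """Return num_frames latent codes evenly spaced from z_start to z_end."""
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    return [
        lerp(z_start, z_end, i / (num_frames - 1))
        for i in range(num_frames)
    ]


# Toy 2-dimensional latents; real codes would be much larger.
frames = interpolate_latents([0.0, 2.0], [1.0, 0.0], num_frames=5)
```

Each entry of frames would then be passed through the generator, and the resulting images saved in order as the gif.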
Below, I present some results for the scribble-to-image task. While the output images resemble cats, they suffer from distortions, artifacts, and an excessive use of blues. The task is challenging because hand-drawn sketches are incomplete and ambiguous, which makes them difficult to interpret and translate into coherent images.
I used Stable Diffusion to generate a set of images from text prompts; the results are presented below.
"A black cat scribble with a big smile"
"A brown cat scribble with an very angry looking"
"A white cat scribble with a big head and a curious looking"
I conducted image generation experiments on grumpy cat images with a resolution of 128 X 128, and the resulting images are displayed below.
I also conducted image generation experiments on grumpy cat images with a resolution of 256 X 256, and the results are presented below.
Additionally, I conducted image generation experiments on the Afhqcat dataset; the results are presented below. It is worth noting that generating high-quality images on the Afhqcat dataset, which has a resolution of 512 X 512, is a challenging task.